- North America > United States > California > Orange County > Mission Viejo (0.04)
- Asia > China (0.04)
- Information Technology > Security & Privacy (0.94)
- Law (0.93)
- Health & Medicine (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.98)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.93)
A Comparison of Human and ChatGPT Classification Performance on Complex Social Media Data
Green, Breanna E., Shea, Ashley L., Zhao, Pengfei, Margolin, Drew B.
Generative artificial intelligence tools, like ChatGPT, are an increasingly utilized resource among computational social scientists. Nevertheless, the performance of ChatGPT on complex tasks, such as classifying and annotating datasets containing nuanced language, remains insufficiently understood. In this paper, we measure the performance of GPT-4 on one such task and compare results to human annotators. We investigate ChatGPT versions 3.5, 4, and 4o to examine performance given the rapid pace of advancement in large language models. We craft four prompt styles as input and evaluate precision, recall, and F1 scores. Both quantitative and qualitative evaluations demonstrate that while including label definitions in prompts may help performance, GPT-4 overall has difficulty classifying nuanced language. Qualitative analysis reveals four specific findings. Our results suggest that ChatGPT should be used with prudence in classification tasks involving nuanced language.
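The precision, recall, and F1 scores named in the abstract are standard classification metrics; a minimal sketch of how they would be computed against human annotations for a binary label (the example data below is hypothetical, not from the paper):

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical human labels vs. model labels for six items.
human = [1, 0, 1, 1, 0, 1]
model = [1, 0, 0, 1, 1, 1]
p, r, f = precision_recall_f1(human, model)
print(round(p, 2), round(r, 2), round(f, 2))  # → 0.75 0.75 0.75
```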
- North America > United States (0.46)
- Europe > United Kingdom (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)
AssurAI: Experience with Constructing Korean Socio-cultural Datasets to Discover Potential Risks of Generative AI
Lim, Chae-Gyun, Han, Seung-Ho, Byun, EunYoung, Han, Jeongyun, Cho, Soohyun, Joo, Eojin, Kim, Heehyeon, Kim, Sieun, Lee, Juhoon, Lee, Hyunsoo, Lee, Dongkun, Hyeon, Jonghwan, Hwang, Yechan, Lee, Young-Jun, Lee, Kyeongryul, An, Minhyeong, Ahn, Hyunjun, Son, Jeongwoo, Park, Junho, Yoon, Donggyu, Kim, Taehyung, Kim, Jeemin, Choi, Dasom, Lee, Kwangyoung, Lim, Hyunseung, Jung, Yeohyun, Hong, Jongok, Nam, Sooyohn, Park, Joonyoung, Na, Sungmin, Choi, Yubin, Choi, Jeanne, Hong, Yoojin, Jang, Sueun, Seo, Youngseok, Park, Somin, Jo, Seoungung, Chae, Wonhye, Jo, Yeeun, Kim, Eunyoung, Whang, Joyce Jiyoung, Hong, HwaJung, Seering, Joseph, Lee, Uichin, Kim, Juho, Choi, Sunna, Ko, Seokyeon, Kim, Taeho, Kim, Kyunghoon, Ha, Myungsik, Lee, So Jung, Hwang, Jemin, Kwak, JoonHo, Choi, Ho-Jin
The rapid evolution of generative AI necessitates robust safety evaluations. However, current safety datasets are predominantly English-centric, failing to capture specific risks in non-English, socio-cultural contexts such as Korean, and are often limited to the text modality. To address this gap, we introduce AssurAI, a new quality-controlled Korean multimodal dataset for evaluating the safety of generative AI. First, we define a taxonomy of 35 distinct AI risk factors, adapted from established frameworks by a multidisciplinary expert group to cover both universal harms and relevance to the Korean socio-cultural context. Second, leveraging this taxonomy, we construct and release AssurAI, a large-scale Korean multimodal dataset comprising 11,480 instances across text, image, video, and audio. Third, we apply a rigorous quality control process to ensure data integrity, featuring a two-phase construction (i.e., expert-led seeding and crowdsourced scaling), triple independent annotation, and an iterative expert red-teaming loop. Our pilot study validates AssurAI's effectiveness in assessing the safety of recent LLMs. We release AssurAI to the public to facilitate the development of safer and more reliable generative AI systems for the Korean community.
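The triple independent annotation described in the abstract would typically be resolved by majority vote, with unresolved items escalated to experts; a minimal sketch under that assumption (the label names and data here are illustrative, not AssurAI's actual scheme):

```python
from collections import Counter

def resolve_labels(annotations):
    """Majority-vote lists of three independent annotations per instance;
    instances with no majority are flagged for expert review."""
    resolved, disputed = [], []
    for i, labels in enumerate(annotations):
        label, count = Counter(labels).most_common(1)[0]
        if count >= 2:
            resolved.append((i, label))
        else:
            disputed.append(i)  # e.g., route to an expert red-teaming loop
    return resolved, disputed

votes = [["unsafe", "unsafe", "safe"],
         ["safe", "safe", "safe"],
         ["unsafe", "safe", "borderline"]]
print(resolve_labels(votes))  # → ([(0, 'unsafe'), (1, 'safe')], [2])
```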
- Research Report > Experimental Study (0.69)
- Research Report > New Finding (0.68)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Information Technology > Security & Privacy (1.00)
- Law > Criminal Law (0.68)
- Government (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
Creativity Benchmark: A benchmark for marketing creativity for large language models
Bhat, Ninad, Browne, Kieran, Bingemann, Pip
We introduce Creativity Benchmark, an evaluation framework for large language models (LLMs) in marketing creativity. The benchmark covers 100 brands (12 categories) and three prompt types (Insights, Ideas, Wild Ideas). Human pairwise preferences from 678 practising creatives over 11,012 anonymised comparisons, analysed with Bradley-Terry models, show tightly clustered performance with no model dominating across brands or prompt types: the top-bottom spread is $\Delta\theta \approx 0.45$, implying that the highest-rated model beats the lowest-rated in a head-to-head comparison only about $61\%$ of the time. We also analyse model diversity using cosine distances to capture intra- and inter-model variation and sensitivity to prompt reframing. Comparing three LLM-as-judge setups with human rankings reveals weak, inconsistent correlations and judge-specific biases, underscoring that automated judges cannot substitute for human evaluation. Conventional creativity tests also transfer only partially to brand-constrained tasks. Overall, the results highlight the need for expert human evaluation and diversity-aware workflows.
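Under the Bradley-Terry model, a rating gap $\Delta\theta$ maps to a head-to-head win probability through the logistic function, which is how a spread of roughly 0.45 yields the quoted ~61% win rate; a minimal check of that arithmetic:

```python
import math

def bt_win_probability(delta_theta: float) -> float:
    """Bradley-Terry win probability for a rating gap delta_theta
    (probability that the higher-rated side wins a single comparison)."""
    return 1.0 / (1.0 + math.exp(-delta_theta))

# A top-bottom spread of ~0.45 implies roughly a 61% win rate.
print(round(bt_win_probability(0.45), 2))  # → 0.61
```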
- Europe > Austria > Vienna (0.14)
- Oceania > Australia (0.04)
- North America > United States (0.04)
- (2 more...)
- Leisure & Entertainment (1.00)
- Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.92)
- Information Technology (0.92)
- Health & Medicine (0.67)
Evaluating Prompting Strategies and Large Language Models in Systematic Literature Review Screening: Relevance and Task-Stage Classification
Han, Binglan, Mathrani, Anuradha, Susnjak, Teo
This study quantifies how prompting strategies interact with large language models (LLMs) to automate the screening stage of systematic literature reviews (SLRs). We evaluate six LLMs (GPT-4o, GPT-4o-mini, DeepSeek-Chat-V3, Gemini-2.5-Flash, Claude-3.5-Haiku, Llama-4-Maverick) under five prompt types (zero-shot, few-shot, chain-of-thought (CoT), CoT-few-shot, self-reflection) across relevance classification and six Level-2 tasks, using accuracy, precision, recall, and F1. Results show pronounced model-prompt interaction effects: CoT-few-shot yields the most reliable precision-recall balance; zero-shot maximizes recall for high-sensitivity passes; and self-reflection underperforms due to over-inclusivity and instability across models. GPT-4o and DeepSeek provide robust overall performance, while GPT-4o-mini performs competitively at a substantially lower dollar cost. A cost-performance analysis for relevance classification (per 1,000 abstracts) reveals large absolute differences among model-prompt pairings; GPT-4o-mini remains low-cost across prompts, and structured prompts (CoT/CoT-few-shot) on GPT-4o-mini offer attractive F1 at a small incremental cost. We recommend a staged workflow that (1) deploys low-cost models with structured prompts for first-pass screening and (2) escalates only borderline cases to higher-capacity models. These findings highlight LLMs' uneven but promising potential to automate literature screening. By systematically analyzing prompt-model interactions, we provide a comparative benchmark and practical guidance for task-adaptive LLM deployment.
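The staged workflow recommended above can be sketched as a simple cascade: a low-cost model screens every abstract, and only borderline scores are escalated. The thresholds, scores, and model stand-ins below are illustrative assumptions, not values from the paper:

```python
def staged_screening(abstracts, cheap_model, strong_model,
                     low=0.2, high=0.8):
    """First-pass screening with a low-cost model; only borderline
    relevance scores are escalated to the higher-capacity model."""
    decisions = {}
    for a in abstracts:
        score = cheap_model(a)  # relevance probability from the cheap model
        if score >= high:
            decisions[a] = "include"
        elif score <= low:
            decisions[a] = "exclude"
        else:  # borderline: pay for the stronger model on this item only
            decisions[a] = "include" if strong_model(a) >= 0.5 else "exclude"
    return decisions

# Toy stand-ins for model calls (hypothetical scores).
cheap = {"A": 0.9, "B": 0.1, "C": 0.5}.get
strong = {"C": 0.7}.get
print(staged_screening(["A", "B", "C"], cheap, strong))
# → {'A': 'include', 'B': 'exclude', 'C': 'include'}
```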
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)